Modeling protein evolution with several amino acid replacement matrices depending on site rates.

Authors

  • Si Quang Le
  • Cuong Cao Dang
  • Olivier Gascuel

Abstract

Most protein substitution models use a single amino acid replacement matrix summarizing the biochemical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors that influence the substitution patterns. In this paper, we investigate the use of different substitution matrices for different site evolutionary rates. Indeed, the variability of evolutionary rates is one of the most apparent heterogeneity factors among sites, and there is no reason to assume that the substitution patterns remain identical regardless of the evolutionary rate. We first introduce LG4M, which is composed of four matrices, each corresponding to one of the four discrete gamma rate categories. These matrices differ in their amino acid equilibrium distributions and in their exchangeabilities, unlike the standard gamma model, where only the global rate differs from one category to another. Next, we present LG4X, which also uses four different matrices but leaves aside the gamma distribution and follows a distribution-free scheme for the site rates. All these matrices are estimated from a very large alignment database, and our two models are tested using a large sample of independent alignments. Detailed analysis of the resulting matrices and models shows the complexity of amino acid substitutions and the advantage of flexible models such as LG4M and LG4X. Both significantly outperform single-matrix models, providing gains of dozens to hundreds of log-likelihood units for most data sets. LG4X obtains substantial gains compared with LG4M, thanks to its distribution-free scheme for site rates. Since LG4M and LG4X display such advantages but require the same memory space and have running times comparable to those of standard models, we believe that LG4M and LG4X are relevant alternatives to single replacement matrices. Our models, data, and software are available from http://www.atgc-montpellier.fr/models/lg4x.
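The key difference from the standard gamma model, namely one replacement matrix per rate category instead of a single shared matrix, can be illustrated as a per-site mixture likelihood. The sketch below is a minimal, hypothetical illustration and not the authors' implementation: it assumes a toy 3-state alphabet instead of the 20 amino acids, a two-sequence tree with branch length `t`, and made-up exchangeabilities, frequencies, weights, and rates (not the published LG4M/LG4X parameters). The helper names (`build_rate_matrix`, `expm`, `pair_likelihood`) are invented for this example. Each category k contributes w_k · π_k(x) · [exp(r_k Q_k t)]_xy. In an LG4M-like setting, the weights would all equal 1/4 and the rates would come from a four-category discrete gamma distribution. In an LG4X-like setting, as in the toy values below, weights and rates are free.

```python
# Minimal sketch of a per-site mixture likelihood in which every rate
# category has its own rate matrix (the LG4M/LG4X idea). All numbers are
# hypothetical toy values on a 3-state alphabet, NOT published parameters.


def build_rate_matrix(exch, freqs):
    """Reversible rate matrix Q_ij = s_ij * pi_j (i != j), rows summing to 0,
    scaled so that the expected substitution rate sum_i pi_i * (-Q_ii) is 1."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]


def mat_mul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, terms=20, squarings=10):
    """P(t) = exp(Q t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    a = [[x * t / 2 ** squarings for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in mat_mul(term, a)]  # A^m / m!
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def pair_likelihood(x, y, t, components):
    """Joint probability of states x and y at one site for two sequences
    separated by branch length t, summed over (weight, rate, Q, freqs)
    mixture components: sum_k w_k * pi_k(x) * [exp(r_k Q_k t)]_xy."""
    return sum(w * freqs[x] * expm(q, r * t)[x][y]
               for w, r, q, freqs in components)


# Hypothetical per-category exchangeabilities and equilibrium frequencies,
# ordered from the slowest to the fastest category.
EXCH = [
    [[0, 1.0, 0.2], [1.0, 0, 0.5], [0.2, 0.5, 0]],
    [[0, 1.2, 0.4], [1.2, 0, 0.7], [0.4, 0.7, 0]],
    [[0, 1.6, 0.9], [1.6, 0, 0.9], [0.9, 0.9, 0]],
    [[0, 2.0, 1.5], [2.0, 0, 1.0], [1.5, 1.0, 0]],
]
FREQS = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.35, 0.35, 0.3], [0.2, 0.3, 0.5]]

# LG4X-like: free weights and rates (weighted mean rate = 1.0).
WEIGHTS = [0.4, 0.3, 0.2, 0.1]
RATES = [0.2, 0.8, 1.5, 3.8]
MATRICES = [build_rate_matrix(s, f) for s, f in zip(EXCH, FREQS)]
LG4X_STYLE = list(zip(WEIGHTS, RATES, MATRICES, FREQS))

# Single-matrix rate-heterogeneity model for contrast: the same weights
# and rates, but one shared matrix and frequency vector.
SINGLE_MATRIX = [(w, r, MATRICES[1], FREQS[1]) for w, r in zip(WEIGHTS, RATES)]

if __name__ == "__main__":
    for name, model in [("four matrices", LG4X_STYLE), ("single matrix", SINGLE_MATRIX)]:
        print(name, pair_likelihood(0, 2, 0.5, model))
```

Running the file prints the pair likelihood under both parameterizations. Giving every component the same matrix and frequencies, as `SINGLE_MATRIX` does, recovers the usual single-matrix rate-heterogeneity model that the abstract contrasts with LG4M and LG4X. A real implementation would use 20×20 matrices, a full tree with Felsenstein pruning, and a library matrix exponential.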

Similar resources

Biases in Amino Acid Replacement Matrices and Alignment Scores Due to Rate Heterogeneity

Empirically derived amino acid replacement matrices are widely used in sequence comparison and database searches. We consider an extension of the usual Markov process model of protein evolution that admits site-to-site rate heterogeneity and demonstrate that rate heterogeneity can introduce a bias in estimated replacement probabilities and the corresponding alignment scores derived from these ...

Site-specific amino acid replacement matrices from structurally constrained protein evolution simulations.

Evolutionary amino acid replacement rates depend on local structural environment (Overington et al. 1992; Koshi and Goldstein 1995). Recent models of protein evolution aim to take such site heterogeneity into account by using site-specific amino acid replacement matrices (Lio and Goldman 1998; Thorne 2000). This is a difficult task, mainly because of a lack-of-data problem: too many sequences w...

Phylogenetic mixture models for proteins.

Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a singl...

A combined empirical and mechanistic codon model.

The evolutionary selection forces acting on a protein are commonly inferred using evolutionary codon models by contrasting the rate of synonymous to nonsynonymous substitutions. Most widely used models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. In this paper, we develop a general method that allows assim...

Effect of solvent extracted soybean meal and full-fat soya on the protein and amino acid digestibility and body amino acid composition in rainbow trout (Oncorhynchus mykiss)

This study was carried out to investigate the apparent digestibility coefficients (ADCs) of protein, amino acids, and energy, and the body amino acid composition of rainbow trout fed solvent-extracted soybean meal (SBM) and full-fat soybean meal (FFS) partly replacing fish meal (FM) in diets. Five isonitrogenous (average 50.36% crude protein) and isoenergetic (4294 kcal/kg total energy) diets wer...

Journal title:
  • Molecular biology and evolution

Volume 29, Issue 10

Publication date: 2012